<html>
<head>
<title>C++ for Java Programmers</title>
<!-- Created by: Barbara Lerner,  5-Sep-1998 -->
<!-- Changed by: Barbara Lerner, 11-Sep-1998 -->
</head>
<body bgcolor="white">
<center>
<h1>C++ for Java Programmers</h1>
<h2>Barbara Staudt Lerner<BR>
September 1998</h2>
</center>
C++ was developed in the early 1980's.  Its goal was to introduce
object-orientation to C while maintaining backwards compatibility so
that existing C programs would continue to work without change.  Java,
developed in the early 1990's, inherited none of that baggage and
instead aimed to be a pure object-oriented language with
syntactic similarities to C.<P>

Besides the desire for backwards compatibility, C++ also maintained
the philosophy that performance was critical.  Many design decisions
in C++ were made to allow programmers to get maximum performance out
of their programs at the cost of making the code more difficult to
understand and more error-prone.<P>

Java was originally developed as a programming language for programs
embedded in electronic devices, such as microwave ovens, CD players,
telephones, etc.  Since this was a new marketplace, there was not a
lot of legacy code that the Java designers needed to maintain
compatibility with.  Also, since the performance of architectures has
improved substantially and the computing demands of these devices were
not great, performance of the language was much less of a concern, but
productivity of the programmers was considered quite important.  <P>

The end result is that Java is much more programmer-friendly than C++.
James Gosling, the creator of Java, quips that Java is C++ without the guns,
knives, and clubs.  Fortunately for us, Nachos uses a restricted
subset of C++ that excludes many (but not all) of the hazardous features.
This document should give you a basic understanding of the
subset of C++ that Nachos uses.  This should help you read the
existing Nachos code.  You will certainly need to supplement this with
a C++ text to get all the details.  This document assumes that you
already know Java.<P> 

<h2>Differences among Features Common to C++ and Java</h2>
<h3>Low-level Syntax</h3>
At the level of statements
and declarations, C++ and Java are quite similar.  The syntax that you
have learned in Java will carry over (mostly, anyway) to C++.  C++
includes more constructs than Java, for which you will need to learn
both the syntax and semantics.<P>

The boolean data type consisting of the values <code>true</code> and
<code>false</code> is called <code>bool</code> in C++.<P>

In C++, the data type <code>char</code> is an 8 bit value capable of
representing an ASCII character.  An instance of the <code>char</code>
type can also be treated as an 8 bit integer.  Therefore, you can do
arithmetic on variables declared as <code>char</code>.<P>
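As a minimal sketch of arithmetic on a <code>char</code> (the helper name <code>nextLetter</code> is made up for illustration, and an ASCII-compatible character set is assumed):

```cpp
// Hypothetical helper: returns the character whose code is one greater.
// Assumes an ASCII-compatible character set, where letters are contiguous.
char nextLetter(char c) {
    return static_cast<char>(c + 1);  // c is promoted to int, then narrowed back
}
```

Under that assumption, <code>nextLetter('a')</code> yields <code>'b'</code>.<P>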

In C++, you can declare any of the integer data types to be
<code>unsigned</code>.  This results in the lower bound for the type
being 0.  For example, on a machine where <code>int</code> is 32 bits, <code>int</code> ranges from
-2<sup>31</sup> to 2<sup>31</sup>-1.  An <code>unsigned int</code>
ranges from 0 to 2<sup>32</sup>-1.<P>
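One consequence of the zero lower bound is that unsigned arithmetic wraps around instead of going negative.  The sketch below (the function name is hypothetical) checks this using the <code>UINT_MAX</code> constant from the standard header <code>limits.h</code>:

```cpp
#include <limits.h>

// Returns true if subtracting 1 from unsigned zero wraps around to UINT_MAX.
// Unsigned arithmetic is defined to wrap around, so this always holds.
bool zeroMinusOneWraps() {
    unsigned int u = 0;
    u = u - 1;              // does not become -1; wraps to the largest value
    return u == UINT_MAX;
}
```

This is worth remembering when an unsigned variable is used as a loop counter that counts down to zero.<P>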

In C++, <code>=</code> is the assignment operator just as in Java.  In
C++, <code>=</code> can 
also be used as an expression that returns the value being assigned.
This allows the following convenient way of initializing two variables
to the same value:
<code><pre>
    a = b = 1;
</pre></code>
<code>1</code> is assigned to <code>b</code>.  The assignment
expression returns the value 
assigned and assigns this to <code>a</code>.<P>

C++ allows the condition controlling an if-statement or
while-statement to be an integer expression.  If the integer
expression evaluates to 0, this is treated the same as the boolean
value <code>false</code>.  A non-zero value is treated the same as
<code>true</code>.  This is bad programming style and should be
avoided, but you might run into this in the existing Nachos code.  It
would be better to say:
<code><pre>
    if (someInt != 0)
</pre></code> 
than to just say
<code><pre>
    if (someInt)
</pre></code> although they are semantically
equivalent.<P>

Combining these last two paragraphs shows one of the most common
syntactic errors in C++ programs:
<code><pre>
    int a = 0, b = 1;
    if (a = b) {
      ...
    }
</pre></code>
Assume that the programmer intended to say <code>a == b</code> as the
condition, which is almost certainly the case.  The programmer
therefore intended that the body of the if-statement would be executed
only if <code>a</code> and <code>b</code> had the same value.  In
Java, the code above would give a compilation error, but it does not
in C++ since <code>=</code> is an expression.  Instead the value of
<code>b</code> is assigned to <code>a</code>, so <code>a</code> now
becomes <code>1</code>.  This value (<code>1</code>) is returned as
the result of the assignment expression.  Since integer expressions
are allowed for conditions, this is ok.  <code>1</code> is treated as
true and the body of the if-statement is executed, which is not what
the programmer intended.  Be on the lookout for this simple error in
your code!<P> 

In C++, you must declare the size of an array when you declare the
array.  Memory is allocated for the array when the array
is declared:  
<code><pre>
    int intArray[10];
</pre></code>
<strong>Beware!  C++
does not check array 
bounds like Java does.  If you use a negative number as an array
index or an index that is greater than or equal to the size of the array,
C++ will happily access some (seemingly random) piece of memory.  If
this appears on the left side of an assignment statement, it will
happily change some (seemingly random) piece of memory.  Always check
array bounds yourself if you are not absolutely certain that the value
is in the correct range!!!</strong><P>
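A minimal sketch of checking bounds yourself follows.  The helper <code>checkedGet</code> is hypothetical, and it uses pointer parameters, which are described in the Pointers section of this document:

```cpp
// Hypothetical helper: copies array[index] into *out and returns true,
// or returns false without touching the array if index is out of range.
bool checkedGet(const int *array, int size, int index, int *out) {
    if (index < 0 || index >= size) {
        return false;
    }
    *out = array[index];
    return true;
}
```

The caller has to pass the array size explicitly, because a C++ array does not carry its length with it the way a Java array does.<P>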

In C++, it is possible to declare that a variable should be kept in a
register.  This is typically done to improve performance but is 
really unnecessary with modern compilers.  Compilers are smart about
recognizing which variables are used most often and keeping those
values in registers.  If not used extremely carefully, register
declarations actually degrade performance.  You should avoid using
them, but the existing Nachos code does use them so you need to
recognize what they are.
<code><pre>
    register int i;
</pre></code>
<P>

<h3>Classes</h3>
Both Java and C++ use classes as an abstraction mechanism.  Classes
encapsulate data and methods that operate on that data.  In C++, a
class definition is 
broken into a declaration and a separate definition of each member.
It is not
possible to attach the keywords <code>public</code>, <code>private</code>, or
<code>protected</code> to the class definition.
If the class is public (in the Java sense), you put its declaration in
a separate file whose name ends in <code>.h</code>, while you put the
definitions in a file with the same name but ending in
<code>.cc</code>.  A class declaration can be broken into three
sections:  a public section, a protected section, and a private
section.  Instead of attaching these keywords to each class member as
in Java, you put the member in the appropriate section of the
declaration as follows:
<code><pre>
    class Pair {
      public:
        Pair (int x, int y);  // The constructor
        int getX();
        int getY();
        void setX (int newValue);
        void setY (int newValue);
    
      private:
        int x;
        int y;
    };
</pre></code><P>
Also, note that C++ requires a semicolon at the end of a class declaration.<P>

To make this class visible to another file, it must be included in
that file:  
<code><pre>
    #include "pair.h"
</pre></code>
There is no notion of a
package in C++.<P>

The member definitions appear in the .cc file as mentioned earlier.
Since they appear outside the class declaration, each definition needs
to declare which class it is in using the <code>class_name::</code>
syntax:
<code><pre>
    int Pair::getX() {
      return x;
    }
</pre></code>
Any members that are fully defined in the class declaration (usually
just the variables) should not be defined in the .cc file.
<P>
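To make the split concrete, here is one way the complete <code>Pair</code> class might look.  The declaration is repeated so the sketch stands alone, and the member bodies are my own guess at the obvious implementation:

```cpp
// pair.h: the declaration (repeated here so the sketch is self-contained).
class Pair {
  public:
    Pair (int x, int y);  // The constructor
    int getX();
    int getY();
    void setX (int newValue);
    void setY (int newValue);

  private:
    int x;
    int y;
};

// pair.cc: one definition per member function declared above.
// The constructor parameters hide the data members, so this-> is needed.
Pair::Pair (int x, int y) {
    this->x = x;
    this->y = y;
}

int Pair::getX() { return x; }
int Pair::getY() { return y; }
void Pair::setX (int newValue) { x = newValue; }
void Pair::setY (int newValue) { y = newValue; }
```

Note that <code>x</code> and <code>y</code> are not defined again in the .cc file, since the declaration already defines them fully.<P>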


If you want to specify a particular member from a specific class when
doing a function call, for example, you
use the <code>::</code> operator, as in 
<code><pre>
    SomeClass::someMethod()
</pre></code><P>

The syntax for declaring a subclass is different in C++:
<code><pre>
    class SubClass : public SuperClass {
      ...
    };
</pre></code>
This is a declaration of <code>SubClass</code> as a subclass of
<code>SuperClass</code>.  Note that you must include the keyword
<code>public</code> before the superclass name.  If you want to allow
a method to be overridden in a subclass, the superclass must include
the keyword <code>virtual</code> in its declaration of the method.
A virtual function declared with <code>= 0</code> after its parameter
list (a pure virtual function) has no definition in that class, and any
class containing one is implicitly abstract.  There is no
<code>abstract</code> keyword for classes or functions.  There is no equivalent to Java's
<code>super</code> keyword.  If you want to refer to a superclass
function that is 
overridden in a subclass, you must explicitly qualify the function
name with the class from which it is inherited using the :: syntax.
Nachos does not use
subclasses, so I will not go further into their details.<P>
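For reference only, here is a small sketch of overriding and of calling the superclass version.  The classes <code>Animal</code> and <code>Bird</code> are invented for this example and do not come from Nachos:

```cpp
// Hypothetical classes, not taken from Nachos.
class Animal {
  public:
    virtual ~Animal() {}
    virtual int legs() { return 4; }   // virtual: may be overridden
};

class Bird : public Animal {
  public:
    virtual int legs() { return 2; }   // overrides Animal::legs
    int legsOfSuperclass() {
        return Animal::legs();          // what Java would write as super.legs()
    }
};

// A call through a superclass pointer still reaches the subclass version.
int legsVia(Animal *a) { return a->legs(); }
```

Without the <code>virtual</code> keyword in <code>Animal</code>, the call in <code>legsVia</code> would always use <code>Animal</code>'s version.<P>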

<h3>Input and Output</h3>
There are several ways to do input and output in C++.  The first way
is a hold-over from C.  Here, you use the function <code>printf</code>
to produce output.  <code>printf</code> takes a variable number of
arguments.  The first argument is a string.  The string may have
embedded within it zero or more control sequences.  For each control
sequence there must be an additional parameter to printf defining the
value to use for that control sequence.  The control sequences used in
Nachos are the following:
<table>
<tr><td>%c
<td>A character.
<tr><td>%s
<td>String
<tr><td>%d
<td>An integer to be represented as a decimal string.
<tr><td>%x
<td>An integer to be represented as a hexadecimal string.
<tr><td>%f
<td>A floating point number.
</table>
An important variant of <code>printf</code> is <code>fprintf</code>.
<code>fprintf</code> is like <code>printf</code>, except that it takes
an additional argument before the string which is a file pointer (for
an already-open file).  The output is written to the file instead of
to standard output.  Here is an example use of <code>printf</code>:
<code><pre>
    char month [4];
    int year;
    strcpy (month, "Sep");  // Assigns a value to a string variable.
    year = 1998;
    printf ("The month is %s.  The year is %d.\n", month, year);
</pre></code><P>

<code>scanf</code> is the input function from C.  Its format is
similar to <code>printf</code>.  The first argument is a string often
exclusively consisting of control sequences and whitespace.  For each
control sequence, there must be an argument that is the <a
href="#pointer">address</a> of a 
variable of the appropriate type.  The <a href="#malloc">memory must
be allocated</a> 
already.  <code>sscanf</code> is a variant of <code>scanf</code> with
an additional first argument representing a string to parse instead of
reading from standard input.  <code>sscanf</code> is typically used to
convert a string to an integer.  For example, suppose <code>s</code>
contains "1998", 
<code><pre>
    sscanf(s, "%d", &year);
</pre></code> will set
<code>year</code> equal to the integer 1998.<P>
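<code>sscanf</code> returns the number of control sequences it managed to convert, which gives a simple way to detect bad input.  The wrapper below is a sketch, and the name <code>parseInt</code> is hypothetical:

```cpp
#include <stdio.h>

// Hypothetical helper: parses a decimal integer from s.
// Returns true and stores the value in *result if a number was read.
bool parseInt(const char *s, int *result) {
    // sscanf returns the number of control sequences successfully converted.
    return sscanf(s, "%d", result) == 1;
}
```

With this, <code>parseInt("1998", &year)</code> succeeds, while <code>parseInt("abc", &year)</code> returns false and leaves <code>year</code> unchanged.<P>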

The second (and preferred) way of doing input and output in C++ is
using streams.  Nachos uses <code>printf</code> rather than streams so
I will not discuss them here.<P>

(There is also an <code>sprintf</code> function that places the
formatted output in a string and a <code>fscanf</code> function that
reads input from a file, but these are not used in Nachos.)<P>

<h2>Features Unique to C++ (and used in Nachos)</h2>

<h3>Constants and Macros</h3>
To declare a constant in C++, you can use <code>#define</code> (not
<code>final</code>; C++ also offers <code>const</code> variables) as in:
<code><pre>
    #define MAX_SIZE 10
</pre></code>
<code>#define</code> is actually
a much more powerful macro mechanism.  Everywhere that the
defined name appears within the scope, it is replaced by the
definition, which could be a complex expression.  This is often done
to make a statement look like a function call but without having the
runtime overhead of making a procedure call.  For example, here is a
macro defining minimum:
<code><pre>
    #define min(a,b)  (((a) < (b)) ? (a) : (b))
</pre></code>
Later in the code, the programmer can say:  
<code><pre>
    min (i, 4)
</pre></code>
and the preprocessor expands it, before the compiler sees it, to:
<code><pre>
    (((i) < (4)) ? (i) : (4))
</pre></code>
Since this is done by the
preprocessor, the expression is inlined rather than being executed as
a function call.<P>
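Because the expansion is purely textual, an argument with a side effect can be evaluated more than once.  The sketch below illustrates this.  The macro has the same shape as the one above but is renamed <code>MIN_OF</code> (a made-up name) to avoid clashing with library names:

```cpp
// Same shape as the min macro above, renamed for this sketch.
#define MIN_OF(a,b)  (((a) < (b)) ? (a) : (b))

// Returns the final value of i after MIN_OF(i++, 4), starting from i = 0.
// The macro pastes "i++" into the expansion twice, so i is incremented twice.
int incrementsAfterMacro() {
    int i = 0;
    int m = MIN_OF(i++, 4);   // expands to (((i++) < (4)) ? (i++) : (4))
    (void) m;                 // m is 1 here, not the 0 a function call would give
    return i;                 // 2
}
```

A real function would evaluate <code>i++</code> exactly once, so avoid passing expressions with side effects to macros.<P>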

<h3>Compiler Directives</h3>
<code>#define</code> is one example of a compiler preprocessor
command.  This is a command that is executed by a preprocessor that
scans the code prior to compilation.  The preprocessor is run
automatically when you run the compiler.  Two other common directives in
C++ are <code>#ifdef</code> and <code>#ifndef</code>.
<code>#ifdef</code> takes a variable name for its condition.  If that
variable name is defined, it evaluates to true and its body is
included in the source code that is compiled.  <code>#ifndef</code> is
similar but includes its body if
the variable is not defined.  Both may have <code>#else</code>
clauses.  They both end with the delimiter <code>#endif</code>.
<code><pre>
    #ifdef HOST_SPARC
    #include &lt;sys/time.h&gt;
    #endif
</pre></code>

This is how C++ programmers typically port programs between
architectures.  Architecture-dependent code is placed inside #ifdef
statements.  When the code is compiled, the appropriate variable is
set for the architecture allowing the correct code to be compiled in.
Unlike normal if-statements, these if-statements are evaluated at
compilation time.  The branch that is true at compilation time is
compiled into the program.  Branches that are false are not compiled
in.  The condition is not tested at runtime.
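<P>
As a sketch of how one function can have two compile-time variants, consider the following.  <code>HOST_EXAMPLE</code> is a made-up macro name and <code>hostName</code> is a hypothetical function:

```cpp
// HOST_EXAMPLE is a made-up macro name used only for this sketch.
// Defining it (for example on the compiler command line) selects the
// first branch; otherwise the #else branch is compiled.
const char *hostName() {
#ifdef HOST_EXAMPLE
    return "example host";
#else
    return "generic host";
#endif
}
```

Only one of the two <code>return</code> statements ever reaches the compiler, so the unused branch costs nothing at runtime.<P>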

<h3>Life Outside of a Class</h3>

In Java, everything is declared inside of a class.  Since C++ needed
to maintain backwards compatibility with C, this is not true
for C++.  Data types, variables, and functions can all be declared
outside of classes.  These are referred to by simply using their
names.  There is no <code>.</code> syntax required to dereference
them. <P>

If a data declaration is to be global and shared between multiple
files, it is declared in one file and declared to be an
<code>extern</code> in the other files.  We all know that global
variables are bad, so we shouldn't do this; however, there are some
instances of this in Nachos.  A function can also be declared this
way, but it is better style to put the function declaration in a .h
file and <code>#include</code> the .h file rather than
<code>extern</code> the function declaration.  Occasionally you will see
something like:
<code><pre>
    extern "C" {
      &lt;a list of declarations&gt;
    }
</pre></code>
This indicates that the listed declarations refer to code compiled as
plain C (they use C linkage), not to members of classes.<P>

The main program for a C++ program is called <code>main</code>, but it
is declared externally to any class.  Its signature is:
<code><pre>
int main (int argc, char **argv);
</pre></code><P>
The first parameter is the number of arguments.  The second parameter
is an <a href="#stringarray">array of strings</a>, each string
containing one command-line argument.<P> 

<h3>Struct Types</h3>

Data type declarations outside of classes are encapsulated inside a
<code>struct</code>:
<code><pre>
struct Date {
  char *month;
  int date;
  int year;
};

struct Date someDate;
</pre></code><P>

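Typically, when declaring a type one gives the type a name.  Oddly
enough, in C the declaration above creates a type named
<code>struct Date</code>, not <code>Date</code>.  (C++ also accepts
plain <code>Date</code>, but C-style code spells out the
<code>struct</code>.)  To give it a simple type name in the C style, a
slightly different syntax is
required: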
<code><pre>
typedef struct {
  char *month;
  int date;
  int year;
} Date;

Date someDate;
</pre></code><P>

You will find quite a few struct declarations in existing Nachos
code.  <strong>You should not create any new struct declarations.  Instead you
should create a class whose data members are like the struct fields.</strong><P>

<h3>Enumerated Types</h3>
Enumerated types are one feature of C++ that I really wish Java had.
With an enumerated type, you can define your own type with its own set
of discrete values.  For example,
<code><pre>
enum PrimaryColor {
  red,
  yellow,
  blue
};

PrimaryColor c;
c = red;
</pre></code>
<P>
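Enumerated values work well as <code>switch</code> labels.  The sketch below repeats the type so it stands alone, and the helper <code>colorName</code> is hypothetical:

```cpp
enum PrimaryColor {
  red,
  yellow,
  blue
};

// Hypothetical helper: maps each enumerated value to a printable name.
const char *colorName(PrimaryColor c) {
    switch (c) {
      case red:    return "red";
      case yellow: return "yellow";
      case blue:   return "blue";
    }
    return "unknown";   // only reachable if c holds a value outside the enum
}
```

Underneath, the values are small integers numbered from 0 in declaration order, so <code>red</code> is 0 and <code>blue</code> is 2.<P>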
Enumerated types are simulated in Java in the following way.  The
programmer declares a class (equivalent to the 
enumerated type) that provides a number of public constants
(representing the enumerated values).

<h3>Union Types</h3>
A union type is a type that allows a particular piece of memory to
store a value of different types at different times (a primitive
precursor to subtyping).  A union declaration looks a lot like a
struct declaration:
<code><pre>
union String_or_int {
  char *someString;
  int someInt;
};
</pre></code>
The union itself does not keep track of which type is in it, so a
union is typically used inside a struct where a second field of the
struct remembers the type currently in the union field:
<code><pre>
struct S_or_i {
  bool containsInt;
  union String_or_int x;
};
</pre></code>
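<P>
A sketch of how code might read such a struct safely follows.  The two types are repeated so the example stands alone, and the helper <code>intOrDefault</code> is hypothetical:

```cpp
union String_or_int {
  char *someString;
  int someInt;
};

struct S_or_i {
  bool containsInt;
  union String_or_int x;
};

// Hypothetical helper: returns the integer if one is stored, else fallback.
// Reading x.someInt when a string was stored would misinterpret the bytes,
// so the containsInt flag is always checked first.
int intOrDefault(const struct S_or_i *v, int fallback) {
    return v->containsInt ? v->x.someInt : fallback;
}
```

Whoever stores a value in the union is responsible for setting <code>containsInt</code> at the same time; the compiler does not check this.<P>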



<a name="pointer"></a>
<h3>Pointers</h3>
In Java, all references to objects are pointers to objects.  All
references to primitive types, like <code>int</code> are values.  In
C++, using a type name always means that the variable will have a
value of that type.  It is possible to introduce pointers to values
explicitly and also to create types whose values are pointers to other
values.  Suppose we have a <code>Date</code> type, here is how we
would declare a variable that is to contain a pointer to a date and
also a type to represent a pointer to a date:
<code><pre>
    Date *someDate;         // Variable containing a pointer to a Date

    typedef Date *DatePtr;  // Type defining a pointer to a Date
    DatePtr date2;	    // Variable containing a pointer to a Date

    date2 = someDate;
</pre></code>
<code>someDate</code> and <code>date2</code> both contain pointers to
dates.  The assignment 
statement results in both variables pointing to the same memory
location and therefore sharing the same value as happens in Java.
<P>

Contrast the above with the following similar code that does not use
pointers:
<code><pre>
    Date someDate;
    Date date2;

    date2 = someDate;
</pre></code>
Assuming that <code>Date</code> is simply a struct type, not a pointer
type, the 
assignment statement above <em>copies</em> the value from
<code>someDate</code> to <code>date2</code>.  If the value referenced
in either variable is changed, it has no effect on the other value.
In Java, you would need to explicitly clone the value to have this
effect.  Unless you know the definition of the type involved in an
assignment, you cannot tell whether the assignment results in
value-sharing or value-copying.<P> 

A pointer is dereferenced using the <code>-&gt;</code> syntax:
<code><pre>
    some_pointer->some_field
</pre></code>
In C++, <code>this</code> is a
pointer, not an object.  To dereference it, you must say
<code><pre>
    this-&gt;member
</pre></code><P> 

To get a pointer to an object, you use the <code>&</code> operator:
<code><pre>
    int *intPtr;
    int anInt;

    anInt = 1;
    intPtr = &anInt;
</pre></code><P>

To get the value pointed to by a pointer, you use the <code>*</code>
operator:
<code><pre>
    int *intPtr;
    int anInt, int2;

    anInt = 1;
    intPtr = &anInt;
    int2 = *intPtr;
</pre></code><P>
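Putting these operators together with the earlier contrast between copying and sharing, here is a small sketch.  The <code>Date</code> struct is a simplified stand-in with a <code>const char *</code> month so it can hold a string constant, and the function name is hypothetical:

```cpp
struct Date {
  const char *month;
  int date;
  int year;
};

// Returns true if changing a copy leaves the original untouched
// while changing through a pointer is visible in the original.
bool copyVersusShare() {
    Date original = { "Sep", 11, 1998 };

    Date copy = original;      // value copy: a second, independent struct
    copy.year = 2000;

    Date *alias = &original;   // pointer: a second name for the same struct
    alias->date = 12;

    return original.year == 1998 && original.date == 12;
}
```

The assignment to <code>copy.year</code> never reaches <code>original</code>, but the assignment through <code>alias</code> does.<P>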

In C++, all parameters are passed by value.  If you want to be able to
change the value of a parameter as a side effect, you must declare the
parameter type to be a pointer and you must pass in the address of the
variable that you want to change:
<code><pre>
    void increment (int * anInt) {
        (*anInt)++;
    }

    int i = 0;
    increment (&i);
</pre></code>
<P>
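The same technique lets a function change two variables at once.  This is a sketch with a made-up function name:

```cpp
// Exchanges the two integers whose addresses are passed in.
void swapInts(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}
```

It would be called as <code>swapInts (&x, &y)</code>.  Passing <code>x</code> and <code>y</code> themselves would not compile, since an <code>int</code> is not a pointer.<P>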

If you want to pass a pointer or an array to a function, but you do
not want the object to be changed, you can say that the parameter type
is <code>const</code>:
<code><pre>
    void doSomething (const int * anInt) {...}

    int *intArray;
    // Assume the array has been allocated and given memory.
    doSomething (intArray);
</pre></code><P>
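As a sketch of a function that only reads its array (the name <code>sum</code> and the explicit <code>count</code> parameter are my own choices for illustration):

```cpp
// Hypothetical helper: adds up the first count elements.
// The const promises that sum will not modify the caller's array;
// a line such as values[0] = 0; here would be rejected by the compiler.
int sum(const int *values, int count) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

The caller does not need to declare its own array <code>const</code>; the promise applies only inside the function.<P>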


<a name="malloc"></a>
<h3>Memory Management</h3>
Java is a garbage-collected language.  C++ is not.  In Java, memory is
allocated for an object when that object is constructed.  The memory
is deallocated when there are no more references to that object.<P>

In C++, objects (and structs) can either be automatic or manually
allocated.  Variables whose types are not pointers (such as classes or
structs) are automatically allocated and deallocated.  They are
allocated when they are declared and deallocated at the end of the
block in which they are declared.  You do not use <code>new</code> to
allocate a variable whose type is a class.<P>

With pointer types, the programmer must explicitly allocate and
deallocate memory.  You must allocate memory before assigning a value
to the object.  Allocation for classes is done with <code>new</code>
as in Java.  You should deallocate an object when you believe there are no more
references to that object.  The syntax for
deallocating an object is:  
<code><pre>
    delete list1;
</pre></code> where <code>list1</code>
is a pointer to the object.  If you allocated an array of objects with
<code>new []</code> and you want to delete the whole array, say
<code><pre>
    delete [] objArray;
</pre></code>
<strong>For every object allocated
with <code>new</code> there should be a deallocation with
<code>delete</code>.</strong><P>

If you want to do anything special when
deleting an object, you must define a destructor for the object's
class.  A typical thing to do is to delete the objects referenced by
the object being deleted (if you are sure it holds the last reference
to those objects!).  The syntax for declaring a destructor is:
<code><pre>
    ~MyClass();
</pre></code> where <code>MyClass</code> is the name of the
class containing the destructor.
<P>
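Here is a sketch of a class whose <code>~IntBuffer()</code> member releases memory that the constructor allocated.  The class <code>IntBuffer</code> and its <code>liveBuffers</code> counter are invented for this example:

```cpp
// Hypothetical class that owns a heap-allocated array.
class IntBuffer {
  public:
    IntBuffer (int size);
    ~IntBuffer();             // runs automatically when the object is deleted
    void set (int index, int value);
    int get (int index);

    static int liveBuffers;   // counts constructed-but-not-yet-destroyed objects

  private:
    int *data;
};

int IntBuffer::liveBuffers = 0;

IntBuffer::IntBuffer (int size) {
    data = new int[size];     // allocated with new [] ...
    liveBuffers++;
}

IntBuffer::~IntBuffer() {
    delete [] data;           // ... so released with delete []
    liveBuffers--;
}

void IntBuffer::set (int index, int value) { data[index] = value; }
int IntBuffer::get (int index) { return data[index]; }
```

After <code>IntBuffer *b = new IntBuffer (4);</code> the counter is 1, and after <code>delete b;</code> it is back to 0 and the array has been released.<P>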

You must also allocate memory for pointers to other types (typically
structs), as in C.  To do this you need to
know how big the structure is.  You can find this out using the
<code>sizeof</code> operator:
<code><pre>
    sizeof (some_type)
</pre></code>
The syntax for
allocating memory is:  
<code><pre>
    some_struct = (some_struct_type *) malloc (sizeof (some_struct_type));
</pre></code>
To free such memory, you
use the <code>free</code> function:  
<code><pre>
    free (some_struct);
</pre></code>
<strong>For every variable allocated
with <code>malloc</code> there should be a deallocation with
<code>free</code>.</strong><P>
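A sketch of the full allocate, use, and free cycle follows.  The simplified <code>Date</code> struct and the helper names <code>makeDate</code> and <code>freeDate</code> are invented for illustration:

```cpp
#include <stdlib.h>

struct Date {
  int date;
  int year;
};

// Allocates a Date with malloc and fills it in; returns 0 if out of memory.
struct Date *makeDate(int date, int year) {
    struct Date *d = (struct Date *) malloc (sizeof (struct Date));
    if (d != 0) {
        d->date = date;
        d->year = year;
    }
    return d;
}

// Every successful makeDate call should be paired with one freeDate call.
void freeDate(struct Date *d) {
    free (d);
}
```

Do not mix the two mechanisms: memory from <code>malloc</code> goes back with <code>free</code>, and memory from <code>new</code> goes back with <code>delete</code>.<P>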

<h3>Similarity between Arrays and Pointers</h3>
Suppose you want to have an array variable, but you do not know how
big the array should be.  Since C++ requires you to declare the size
of the array when you declare the array variable, you cannot declare
it to be an array.  Instead you must declare a pointer to the desired
element type and later allocate the appropriate amount of memory yourself:
<code><pre>
    int *intArray;
    intArray = (int *) calloc (10, sizeof (int));
</pre></code>
Even though you declared the variable to be a pointer, you can still
index it as an array, as in <code>intArray[3]</code>!  <P>
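As a sketch, here is a function that builds an array whose size is known only at runtime.  The name <code>makeSquares</code> is hypothetical:

```cpp
#include <stdlib.h>

// Hypothetical helper: returns a newly allocated array holding 0*0, 1*1, ...
// The caller owns the memory and must release it with free.
int *makeSquares(int count) {
    int *squares = (int *) calloc (count, sizeof (int));
    if (squares == 0) {
        return 0;                 // allocation failed
    }
    for (int i = 0; i < count; i++) {
        squares[i] = i * i;       // array indexing on a pointer
    }
    return squares;
}
```

Unlike <code>malloc</code>, <code>calloc</code> also sets the new memory to zero before returning it.<P>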

There is no string data type in C++.  Strings are simply arrays of
characters.  Since we typically want to allow variable length strings,
string variables are typically declared to be pointers to characters:
<code><pre>
    char *someString;
</pre></code>
All strings must have the special null character '\0' as their last
character to identify the end of the string since there is no length
recorded with the string.  String constants are enclosed in "" and
implicitly end in the null character.  Since strings are pointers,
assigning one string variable to another results in the two variables
pointing to the same piece of memory.  Changing one string changes the
other.  If you do not want this effect, you must use the strcpy
function:
<code><pre>
    char *month;	    // Declare the string
    month = (char *) malloc (4);  // Allocate memory for string ending in null.
    strcpy (month, "Sep");  // Assigns a value to a string variable.
</pre></code>
<strong>Remember to free the string when it is no longer being used.</strong><P>
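When the length is not known in advance, <code>strlen</code> can supply it.  The helper below is a sketch, and the name <code>copyString</code> is made up:

```cpp
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a freshly allocated copy of source.
// The caller must free the result.
char *copyString(const char *source) {
    // strlen does not count the terminating '\0', so add 1 for it.
    char *copy = (char *) malloc (strlen (source) + 1);
    if (copy != 0) {
        strcpy (copy, source);
    }
    return copy;
}
```

Forgetting the extra byte for the null character is a classic mistake: <code>strcpy</code> would then write one byte past the end of the allocated memory.<P>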

<a name="stringarray"></a>
In a few places in Nachos, you will see syntax like the following:
<code><pre>
    char **stringArray;
</pre></code> This is a pointer to a pointer to
a character.  Keeping in mind that pointer declarations often mean
variably-sized arrays, this syntax represents a variably-sized array
of strings (since <code>char *</code> means string).<P>
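A sketch of walking such an array, in the style of the <code>argc</code> and <code>argv</code> parameters of <code>main</code>, follows.  The helper name <code>longestLength</code> is hypothetical:

```cpp
#include <string.h>

// Hypothetical helper: returns the length of the longest string in the array.
// count plays the role of argc and strings the role of argv.
int longestLength(int count, char **strings) {
    int longest = 0;
    for (int i = 0; i < count; i++) {
        int length = (int) strlen (strings[i]);   // strings[i] is a char *
        if (length > longest) {
            longest = length;
        }
    }
    return longest;
}
```

As with any array reached through a pointer, the number of elements has to be passed separately.<P>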


<h2>Features Unique to Java</h2>
There is no equivalent to the <code>synchronized</code> keyword in C++.
Threads are also not built into the language.
When we discuss
threads and synchronization in class, we will discuss how this is done
in C++.<P>

There is no equivalent of JavaDoc for C++.<P>

C++ does not have an <code>instanceof</code> operator.<P>

C++ does not have interfaces.<P>

C++ does not come with a large standardized class library as Java
does.  If you look at sections 2 and 3 of the Unix manual (using
<code>xman</code>), however, 
you will see a large collection of C functions that can be called from
C++.  Some of the common functions, like <code>strcpy</code>, are the same across
all operating systems, but, in general, the routines in these
libraries are not standardized (particularly the system calls in
section 2).  You need to explore the man pages to
see what is on the particular operating system you are using.  This
lack of standardization is one reason that C++ programs are not as
portable as Java ones.

<hr>
<address>
Last modified by <a href="mailto:lerner@cs.umass.edu">Barbara
Lerner</a> on September 11, 1998.
</address>
</body>
</html>
